Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RAC] Rule monitoring: Event Log for Rule Registry #98353

Merged
merged 6 commits into from
May 27, 2021

Conversation

banderror
Copy link
Contributor

@banderror banderror commented Apr 26, 2021

Needed for: rule execution log for Security #94143
Related to:

Summary

This PR adds a mechanism for writing to / reading from / bootstrapping indices for RAC project into the rule_registry plugin. Particularly, indices for alerts-as-data and rule execution events. This implementation is similar to existing implementations like event_log plugin (see #98353 (comment) for historical perspective), but we're going to converge all of them into 1 or 2 implementations. At least we should have a single one in rule_registry itself.

In this PR I tried to incorporate most of the feedback received in the RFC (#98912), but if you notice I missed/forgot something, please let me know in the comments.

Done in this PR:

  • Schema-agnostic APIs for working with Elasticsearch.
  • Schema-aware log definition and bootstrapping API (creating hierarchical logs).
  • Schema-aware write API (logging events).
  • Schema-aware read API (searching logs, filtering, sorting, pagination, aggregation).
  • Support for Kibana spaces, space-aware index bootstrapping (either at rule creation or rule execution time).

As for reviewing this PR, perhaps it might be easier to start with:

Next steps

Next steps towards rule execution log in Security (#94143):

  • define actual schema for rule execution events
  • inject instance of rule execution log into Security rule executors and route handlers
  • implement actual execution logging in rule executors
  • update route handlers to start fetching execution events and metrics from the log instead of custom saved objects

Next steps in the context of RAC and unified implementation:

  • converge this implementation with RuleDataService implementation
    • implement robust index bootstrapping
    • reconsider using FieldMap as a generic type parameter
    • implement validation for documents being indexed
  • cover the final implementation with tests
  • write comprehensive docs: update plugin README, add JSDoc comments to all public interfaces

Checklist

Delete any items that are not applicable to this PR.

  • Documentation was added for features that require explanation or tutorials
  • Unit or functional tests were updated or added to match the most common scenarios
  • If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list

@banderror banderror added Team:Detections and Resp Security Detection Response Team Theme: rac label obsolete labels Apr 26, 2021
@banderror banderror self-assigned this Apr 26, 2021
@banderror banderror force-pushed the event-log-for-rule-registry branch 12 times, most recently from e42dc9d to 1bd2b91 Compare April 30, 2021 11:50
@banderror banderror changed the title [RAC] Event Log for Rule Registry (WIP) [RAC] Rule monitoring: Event Log for Rule Registry (WIP) Apr 30, 2021
@banderror banderror force-pushed the event-log-for-rule-registry branch 6 times, most recently from f97fcd3 to 227f73f Compare May 4, 2021 10:15
@pmuellr
Copy link
Member

pmuellr commented May 4, 2021

Trying to understand a couple of things, including regarding the PR #98935 - RAC -Decouple registry from alerts-as-data client.

  • It seems like the decouple PR will handle the creation / read / write / cleanup operations for the indices - not framework-y, but more of a library style; which would imply we don't need some of that logic here
  • Is there an expectation that every alert type will use these? Or is it more just for security? Or is it that the concept of separating the "alert data" from the "execution data" is what we're actually doing here, and we do want to allow any alert type to use both
  • I think if we use the execlog as envisioned here, we probably won't have much of a need for the existing event log at all, except to track action execution.
  • Why can't we use the current event log as the execlog? Seems like it would be nice to avoid having to have another set of indices to deal with, if we can get by with the event log. From [Security Solution][Detections] Proposal: building Rule Execution Log on top of Event Log and ECS #94143, there's this:
    • What's missing in the current Event Log API
      • extended support for ECS - more fields and field sets, in particular: standard event., log., rule.*
      • custom kibana.detection_engine.*
      • aggregation queries
      • sorting by multiple fields
      • nice-to-have: custom ES DSL filters (term, terms) instead of string KQL filter
      • limiting source fields to return: "_source": ["@timestamp", "message", "log.level", "event.action"]
  • If we end up using the execlog, but we still want to keep the event log around (for some reason), do we still need the changes made to the event log in PR [Event Log] Extend ECS event schema with fields needed for Detection Engine #95067 ? Seems like they won't be used any more.

@banderror banderror force-pushed the event-log-for-rule-registry branch from 6269a30 to 093e238 Compare May 5, 2021 09:25
@gmmorris
Copy link
Contributor

gmmorris commented May 5, 2021

Trying to understand a couple of things, including regarding the PR #98935 - RAC -Decouple registry from alerts-as-data client.

  • It seems like the decouple PR will handle the creation / read / write / cleanup operations for the indices - not framework-y, but more of a library style; which would imply we don't need some of that logic here

  • Is there an expectation that every alert type will use these? Or is it more just for security? Or is it that the concept of separating the "alert data" from the "execution data" is what we're actually doing here, and we do want to allow any alert type to use both

  • I think if we use the execlog as envisioned here, we probably won't have much of a need for the existing event log at all, except to track action execution.

  • Why can't we use the current event log as the execlog? Seems like it would be nice to avoid having to have another set of indices to deal with, if we can get by with the event log. From [Security Solution][Detections] Proposal: building Rule Execution Log on top of Event Log and ECS #94143, there's this:

    • What's missing in the current Event Log API

      • extended support for ECS - more fields and field sets, in particular: standard event., log., rule.*
      • custom kibana.detection_engine.*
      • aggregation queries
      • sorting by multiple fields
      • nice-to-have: custom ES DSL filters (term, terms) instead of string KQL filter
      • limiting source fields to return: "_source": ["@timestamp", "message", "log.level", "event.action"]
  • If we end up using the execlog, but we still want to keep the event log around (for some reason), do we still need the changes made to the event log in PR [Event Log] Extend ECS event schema with fields needed for Detection Engine #95067 ? Seems like they won't be used any more.

@pmuellr @tsg Feels like this is worth adding as a discussion item to the next Alerts at Data sync.... I'm afraid of this getting lost in issue comments :)

@tsg
Copy link
Contributor

tsg commented May 5, 2021

@gmmorris thanks for the ping. I added it to the agenda for May 11th. If we think it's blocking work, we could also meet ad-hoc earlier.

  • I think if we use the execlog as envisioned here, we probably won't have much of a need for the existing event log at all, except to track action execution.
  • Why can't we use the current event log as the execlog? Seems like it would be nice to avoid having to have another set of indices to deal with, if we can get by with the event log.

I think this is a fair question and I am also concerned about potentially diverging here (when/what should I write to eventlog vs execlog).

The argument for the execlog that I heard is that by following the same naming convention as the alert indices, it is easier for the admin to give access to them to the users of the system, so they can monitor the state of their Rules directly.

@spong @dgieselaar @pmuellr Other thoughts on pros/cons here?

@dgieselaar
Copy link
Member

@tsg:

I think this is a fair question and I am also concerned about potentially diverging here (when/what should I write to eventlog vs execlog).

I agree here. I don't really have a strong opinion here, but my preference would be to not have three things (rule registry, execlog, event log). I'd prefer two, max. Not all of these things diverge enough in requirements to warrant maintaining three different implementations, IMHO.

@gmmorris
Copy link
Contributor

gmmorris commented May 5, 2021

The argument for the execlog that I heard is that by following the same naming convention as the alert indices, it is easier for the admin to give access to them to the users of the system, so they can monitor the state of their Rules directly.

This is a curious one for me @tsg .
The goal behind the event log was to provide a curated UI that allows any user to see the execution log of any rule that they have privileges to see.
It would allow any user, out of the box (no need for admin intervention), to view when rules that they can see have executed, what failures may have occurred, when it detected a violation and what actions it has scheduled.

This doesn't feel to me like an index that users need to access directly... do we see much value in them building visualisation over this? 🤔
It seems to me like the added value of having a single index that the framework maintains outweighs the value of user access, but I might be missing something here.

@pmuellr
Copy link
Member

pmuellr commented May 5, 2021

do we see much value in them building visualisation over this?

YES - customers should be able to do all kinds of slicing/dicing of the event log data in viz's and discover.

So we need to make that happen, and there are a number of technical issues with the event log shape we need to change to improve this (hopefully for 7.14).

Longer-term, I'm hoping the "saved objects as data" effort will eventually just turn into "allow me to access an arbitrary index with Kibana RBAC if I set things up just right, probably as a KIP". Which we could then use as a way to provide viz's / discover support for the event log directly, for all customers, with RBAC support. May not end up providing direct ES query support with RBAC, but that's fine because I think viz's and discover get us most of the use cases.

In the meantime, we could look at providing a new built-in role that would allow read-only access to the event log, for admin-type folks, to work around having to make the customer do that themselves. And some ease-of-use to add a KIP for it, etc.

I'm just not sure, if the main difference between the execlog and event log is that the execlog is per-space and event log is global, that we need to deal with that. We still need to handle the case, presumably, of spaces with multiple solutions that you want to have silo'd from users, which puts us more in a execlog per namespace/consumer vs just namespace, and I'm not happy about the potential large number of all these indices ...

@botelastic botelastic bot added the Team:APM All issues that need APM UI Team support label May 18, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/apm-ui (Team:apm)

@kobelb
Copy link
Contributor

kobelb commented May 18, 2021

This sounds like a good idea, even if it's quite rough. There is so much in flight, the conversation is a bit hard to follow and comment intelligently on, understanding what's being suggested in the PR vs what's being asked about in the comments. I think a visual diagram would be a huge, huge help.

+100. I really think we need a single source of truth, that is a living document, for the architecture that is being proposed and how these Elasticsearch indices will be used.

@banderror banderror force-pushed the event-log-for-rule-registry branch from e5b55fe to f9ee1cf Compare May 25, 2021 12:37
@banderror banderror requested a review from a team as a code owner May 25, 2021 12:37
Copy link
Contributor

@smith smith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting this error when starting Kibana in dev:

server    log   [12:34:33.909] [error][IndexManagementGateway][eventLog][plugins][ruleRegistry] ResponseError: security_exception
    at onBody (/Users/smith/Code/kibana-review/node_modules/@elastic/elasticsearch/lib/Transport.js:337:23)
    at IncomingMessage.onEnd (/Users/smith/Code/kibana-review/node_modules/@elastic/elasticsearch/lib/Transport.js:264:11)
    at IncomingMessage.emit (events.js:388:22)
    at endReadableNT (internal/streams/readable.js:1336:12)
    at processTicksAndRejections (internal/process/task_queues.js:82:21) {
  meta: {
    body: { error: [Object], status: 403 },
    statusCode: 403,
    headers: {
      'content-length': '577',
      'content-type': 'application/json;charset=utf-8',
      'x-cloud-request-id': 'QuceUv6KRh6-zCcLjQn7ng',
      'x-found-handling-cluster': 'b5caf8c576704714a9bb2559bddab987',
      'x-found-handling-instance': 'instance-0000000028',
      date: 'Tue, 25 May 2021 17:34:33 GMT'
    },
    meta: {
      context: null,
      request: [Object],
      name: 'elasticsearch-js',
      connection: [Object],
      attempts: 0,
      aborted: false
    }
  }
}
server    log   [12:34:33.914] [error][IndexManagementGateway][eventLog][plugins][ruleRegistry] {
  "body": {
    "error": {
      "root_cause": [
        {
          "type": "security_exception",
          "reason": "action [indices:admin/create] is unauthorized for user [kibana_system_user] with roles [kibana_system] on indices [.alerts-security.alerts-default-000001], this action is granted by the index privileges [create_index,manage,all]"
        }
      ],
      "type": "security_exception",
      "reason": "action [indices:admin/create] is unauthorized for user [kibana_system_user] with roles [kibana_system] on indices [.alerts-security.alerts-default-000001], this action is granted by the index privileges [create_index,manage,all]"
    },
    "status": 403
  },
  "statusCode": 403,
  "headers": {
    "content-length": "577",
    "content-type": "application/json;charset=utf-8",
    "x-cloud-request-id": "QuceUv6KRh6-zCcLjQn7ng",
    "x-found-handling-cluster": "b5caf8c576704714a9bb2559bddab987",
    "x-found-handling-instance": "instance-0000000028",
    "date": "Tue, 25 May 2021 17:34:33 GMT"
  },
  "meta": {
    "context": null,
    "request": {
      "params": {
        "method": "PUT",
        "path": "/.alerts-security.alerts-default-000001",
        "body": "{\"aliases\":{\".alerts-security.alerts-default\":{\"is_write_index\":true}}}",
        "querystring": "",
        "headers": {
          "user-agent": "elasticsearch-js/8.0.0-canary.4 (darwin 20.3.0-x64; Node.js v14.17.0)",
          "x-elastic-product-origin": "kibana",
          "x-elastic-client-meta": "es=8.0.0p,js=14.17.0,t=8.0.0p,hc=14.17.0",
          "content-type": "application/json",
          "content-length": "71"
        },
        "timeout": 30000
      },
      "options": {},
      "id": 66
    },
    "name": "elasticsearch-js",
    "connection": {
      "url": "https://b5caf8c576704714a9bb2559bddab987.us-east-1.aws.staging.foundit.no:9243/",
      "id": "https://b5caf8c576704714a9bb2559bddab987.us-east-1.aws.staging.foundit.no:9243/",
      "headers": {},
      "deadCount": 0,
      "resurrectTimeout": 0,
      "_openRequests": 0,
      "status": "alive",
      "roles": {
        "master": true,
        "data": true,
        "ingest": true,
        "ml": false
      }
    },
    "attempts": 0,
    "aborted": false
  }
}
server    log   [12:34:33.914] [error][IndexBootstrapper][eventLog][plugins][ruleRegistry] error bootstrapping elasticsearch resources: Error creating initial index: security_exception
server    log   [12:34:33.914] [error][plugins][ruleRegistry] Error: Event log bootstrapping failed, logName="security.alerts"
    at EventLogProvider.getLog (/Users/smith/Code/kibana-review/x-pack/plugins/rule_registry/server/event_log/log/event_log_provider.ts:68:15)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at testImplementation (/Users/smith/Code/kibana-review/x-pack/plugins/rule_registry/server/event_log/test_implementation.ts:72:3)
    at /Users/smith/Code/kibana-review/x-pack/plugins/rule_registry/server/event_log/test_implementation.ts:42:7

I'm connected to the edge observability cluster so it might have something to do with pre-existing configuration there.

Setting xpack.ruleRegistry.index in kibana.dev.yml works, but I'm not sure what's going on here and if this is something that needs to be fixed in this PR so let me know.

this.pagination = { page: 1, perPage: 20 };
}

public filterByLogger(loggerName: string): IEventQueryBuilder<TEvent> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I enjoy using these kind of builder pattern things, but it seems implementing it as part of this PR in this particular plugin adds a lot more than is needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just decoupled this builder from .search(), so now, if you don't need it or would prefer not using it, you can call .search() directly without messing up with the builder:

export interface IEventLog<TEvent> extends IEventLoggerTemplate<TEvent> {
  getNames(): IndexNames;

  getQueryBuilder(): IEventQueryBuilder<TEvent>;

  search<TDocument = TEvent>(
    request: estypes.SearchRequest
  ): Promise<estypes.SearchResponse<TDocument>>;
}

});

// TODO: remove before merge
testEventLogImplementation(this.eventLogService, this.logger);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the intent of this is. Could it be written instead as an integration test that we can keep around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is to remove this code before merging the PR, just like the comment says :)
We will definitely need to cover this plugin with tests but this is out of scope of this PR. I think we could do it after converging the 2 implementations.

@banderror banderror force-pushed the event-log-for-rule-registry branch from f9ee1cf to d574302 Compare May 26, 2021 08:31
@banderror
Copy link
Contributor Author

banderror commented May 26, 2021

@smith

I'm getting this error when starting Kibana in dev:
I'm connected to the edge observability cluster so it might have something to do with pre-existing configuration there.

It seems you've caught an error during execution of testEventLogImplementation which attempts to bootstrap 2 indices on behalf of internal user of Kibana. It says you have kibana_system_user with role kibana_system.

"reason": "action [indices:admin/create] is unauthorized for user [kibana_system_user] with roles [kibana_system] on indices [.alerts-security.alerts-default-000001], this action is granted by the index privileges [create_index,manage,all]"

This pre-built kibana_system role should be sufficient and should work since this change in Elasticsearch: elastic/elasticsearch#72181 . I'd guess maybe your observability cluster is not yet up-to-date with this change. Could you try checking it with a local Elasticsearch instance?

elasticsearch:
  username: 'kibana_system'
  password: 'changeme'
  hosts: 'http://localhost:9200'

Otherwise you'd need to give your user in the remote cluster the required privileges manually: all privileges to .alerts* and .siem-signals* indices.

@banderror banderror requested review from xcrzx and smith May 26, 2021 17:19
Copy link
Contributor

@xcrzx xcrzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you for the changes!
I left a super minor comment regarding a leftover module, but it is certainly not a blocker. Giving my approval.

signal(value: T): void;
}

export function createReadySignal<T>(): ReadySignal<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is not needed anymore after the latest changes?

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
ruleRegistry 39 41 +2

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
ruleRegistry 5 6 +1
Unknown metric groups

API count

id before after diff
ruleRegistry 39 41 +2

References to deprecated APIs

id before after diff
canvas 29 25 -4
crossClusterReplication 8 6 -2
fleet 22 20 -2
globalSearch 4 2 -2
indexManagement 12 7 -5
infra 261 149 -112
lens 67 45 -22
licensing 18 15 -3
maps 286 208 -78
ml 121 115 -6
monitoring 109 56 -53
securitySolution 386 342 -44
stackAlerts 101 95 -6
total -339

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @banderror

@banderror banderror merged commit 7fd6539 into elastic:master May 27, 2021
@banderror banderror deleted the event-log-for-rule-registry branch May 27, 2021 15:28
banderror added a commit to banderror/kibana that referenced this pull request May 27, 2021
**Needed for:** rule execution log for Security elastic#94143
**Related to:**

- alerts-as-data: elastic#93728, elastic#93729, elastic#93730
- RFC for index naming elastic#98912

## Summary

This PR adds a mechanism for writing to / reading from / bootstrapping indices for RAC project into the `rule_registry` plugin. Particularly, indices for alerts-as-data and rule execution events. This implementation is similar to existing implementations like `event_log` plugin (see elastic#98353 (comment) for historical perspective), but we're going to converge all of them into 1 or 2 implementations. At least we should have a single one in `rule_registry` itself.

In this PR I tried to incorporate most of the feedback received in the RFC (elastic#98912), but if you notice I missed/forgot something, please let me know in the comments.

Done in this PR:

- [x] Schema-agnostic APIs for working with Elasticsearch.
- [x] Schema-aware log definition and bootstrapping API (creating hierarchical logs).
- [x] Schema-aware write API (logging events).
- [x] Schema-aware read API (searching logs, filtering, sorting, pagination, aggregation).
- [x] Support for Kibana spaces, space-aware index bootstrapping (either at rule creation or rule execution time).

As for reviewing this PR, perhaps it might be easier to start with:

- checking description of elastic#98912
- checking usage examples https://github.com/elastic/kibana/pull/98353/files#diff-c049ff2198cc69bd50a69e92d29e88da7e10b9a152bdaceaf3d41826e712c12b
- checking public api https://github.com/elastic/kibana/pull/98353/files#diff-8e9ef0dbcbc60b1861d492a03865b2ae76a56ec38ada61898c991d3a74bd6268

## Next steps

Next steps towards rule execution log in Security (elastic#94143):

- define actual schema for rule execution events
- inject instance of rule execution log into Security rule executors and route handlers
- implement actual execution logging in rule executors
- update route handlers to start fetching execution events and metrics from the log instead of custom saved objects

Next steps in the context of RAC and unified implementation:

- converge this implementation with `RuleDataService` implementation
  - implement robust index bootstrapping
  - reconsider using FieldMap as a generic type parameter
  - implement validation for documents being indexed
- cover the final implementation with tests
- write comprehensive docs: update plugin README, add JSDoc comments to all public interfaces
banderror added a commit that referenced this pull request May 27, 2021
**Needed for:** rule execution log for Security #94143
**Related to:**

- alerts-as-data: #93728, #93729, #93730
- RFC for index naming #98912

## Summary

This PR adds a mechanism for writing to / reading from / bootstrapping indices for RAC project into the `rule_registry` plugin. Particularly, indices for alerts-as-data and rule execution events. This implementation is similar to existing implementations like `event_log` plugin (see #98353 (comment) for historical perspective), but we're going to converge all of them into 1 or 2 implementations. At least we should have a single one in `rule_registry` itself.

In this PR I tried to incorporate most of the feedback received in the RFC (#98912), but if you notice I missed/forgot something, please let me know in the comments.

Done in this PR:

- [x] Schema-agnostic APIs for working with Elasticsearch.
- [x] Schema-aware log definition and bootstrapping API (creating hierarchical logs).
- [x] Schema-aware write API (logging events).
- [x] Schema-aware read API (searching logs, filtering, sorting, pagination, aggregation).
- [x] Support for Kibana spaces, space-aware index bootstrapping (either at rule creation or rule execution time).

As for reviewing this PR, perhaps it might be easier to start with:

- checking description of #98912
- checking usage examples https://github.com/elastic/kibana/pull/98353/files#diff-c049ff2198cc69bd50a69e92d29e88da7e10b9a152bdaceaf3d41826e712c12b
- checking public api https://github.com/elastic/kibana/pull/98353/files#diff-8e9ef0dbcbc60b1861d492a03865b2ae76a56ec38ada61898c991d3a74bd6268

## Next steps

Next steps towards rule execution log in Security (#94143):

- define actual schema for rule execution events
- inject instance of rule execution log into Security rule executors and route handlers
- implement actual execution logging in rule executors
- update route handlers to start fetching execution events and metrics from the log instead of custom saved objects

Next steps in the context of RAC and unified implementation:

- converge this implementation with `RuleDataService` implementation
  - implement robust index bootstrapping
  - reconsider using FieldMap as a generic type parameter
  - implement validation for documents being indexed
- cover the final implementation with tests
- write comprehensive docs: update plugin README, add JSDoc comments to all public interfaces
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes Team:APM All issues that need APM UI Team support Team:Detections and Resp Security Detection Response Team Theme: rac label obsolete v7.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.